Entity Extraction Without Language-Specific Resources

Authors

  • Paul McNamee
  • James May
Abstract

We describe a named-entity tagging system that requires minimal linguistic knowledge and thus may be applied to new target languages without significant adaptation. To maintain a language-neutral posture, the system is linguistically naïve and, in fact, reduces the tagging problem to supervised machine learning. A large number of binary features are extracted from labeled data to train classifiers, and computationally expensive features are eschewed. We have initially focused our attention on linear support vector machines (SVMs); SVMs are known to work well when a large number of features is used, as long as the individual vectors are sparse. We call our system SNOOD (Hopkins APL Inductive Retargetable Named Entity Tagger).
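The approach summarized in the abstract (sparse binary features feeding a linear SVM) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the feature names, the helper functions `token_features`, `score`, and `train_linear_svm`, and the choice of a Pegasos-style subgradient update to fit the hinge loss. Each token is represented only by the set of features that fire for it, which keeps the vectors sparse.

```python
def token_features(tokens, i):
    """Return the set of binary features that fire for tokens[i].

    The feature inventory is hypothetical; it uses only surface cues
    (case, digits, affixes, neighboring words), no lexicons or parsers.
    """
    tok = tokens[i]
    feats = {"bias", "word=" + tok.lower()}
    if tok[:1].isupper():
        feats.add("init_cap")
    if any(ch.isdigit() for ch in tok):
        feats.add("has_digit")
    for n in (1, 2, 3):
        if len(tok) >= n:
            feats.add("prefix%d=%s" % (n, tok[:n].lower()))
            feats.add("suffix%d=%s" % (n, tok[-n:].lower()))
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats.add("prev=" + prev_tok)
    feats.add("next=" + next_tok)
    return feats


def score(weights, feats):
    """Dot product of the weight vector with a sparse binary vector."""
    return sum(weights.get(f, 0.0) for f in feats)


def train_linear_svm(examples, epochs=20, lam=0.01):
    """Fit a linear SVM by Pegasos-style subgradient descent.

    examples: list of (feature_set, label) pairs with label in {+1, -1}.
    lam: L2 regularization strength (an illustrative default).
    """
    weights = {}
    t = 0
    for _ in range(epochs):
        for feats, y in examples:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            violated = y * score(weights, feats) < 1.0
            shrink = 1.0 - eta * lam         # regularization shrinkage
            for f in weights:
                weights[f] *= shrink
            if violated:                     # hinge-loss subgradient step
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + eta * y
    return weights
```

To tag text with this sketch, one binary classifier per entity class could be trained on pairs such as `(token_features(sentence, i), +1)` for tokens inside an entity of that class and `-1` otherwise; a token is then assigned the class whose classifier returns the highest `score`. A production system would more likely rely on an established SVM package than on this hand-rolled trainer.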

Similar resources

A Named Entity Extraction System and its Web extensions

In this work we describe a Named Entity Extraction system originally developed within the scope of the EU-funded FACILE project, and currently used within the CONCERTO project. The system has been firstly tested at the MUC-7 competition. The purpose of the system is to identify and classify proper names in free text. In the FACILE project these were mainly financial news, however the system and res...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Unsupervised concept based entity extraction from scientific titles

This paper studies the extraction and typing of entities from titles of academic literature, in order to gain a deeper understanding of their specific contributions and automate the construction of a problem-solution knowledgebase. To achieve this goal, we propose an unsupervised, domain independent, two phase algorithm to extract entity mentions and type them into appropriate concepts. In the fir...

Annotating and Recognizing Event Modality in Text

Current results in basic Information Extraction tasks such as Named Entity Recognition or Event Extraction suggest that we are close to achieving a stage where the fundamental units for text understanding are put together; namely, predicates and their arguments. However, other layers of information, such as event modality, are essential for understanding, since the inferences derivable from fac...

Towards Heterogeneous Resources-Based Ambiguity Reduction of Sub-typed Geographic Named Entities

The aim of this work is to find sub-typed Geographic Named Entities from the analysis of relations between Place Names surrounded nominal group within a specific phrasal context in a set of textual documents. The paper presents a method involving natural language processing and heterogeneous resources like gazetteers, thesauri or ontologies. The work and the results focus on a French language corpus....

Publication date: 2002